The Norwegian Dependency Treebank

Authors

  • Per Erik Solberg
  • Arne Skjærholt
  • Lilja Øvrelid
  • Kristin Hagen
  • Janne Bondi Johannessen
Abstract

The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmål and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publicly available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles were applied in certain specific cases. We then present the selection of texts and their distribution across genres, as well as the annotation process and an evaluation of the inter-annotator agreement. Finally, we present the first results of data-driven dependency parsing of Norwegian, contrasting four state-of-the-art dependency parsers trained on the treebank. The consistency and the parsability of this treebank are shown to be comparable to those of other large treebank initiatives.

Similar resources

Universal Dependencies for Norwegian

This article describes the conversion of the Norwegian Dependency Treebank to the Universal Dependencies scheme. It details the mapping of PoS tags, morphological features and dependency relations and provides a description of the structural changes made to NDT analyses in order to make them compliant with the UD guidelines. We further present PoS tagging and dependency parsing experiment...

Optimizing a PoS Tagset for Norwegian Dependency Parsing

This paper reports on a suite of experiments that evaluate how the linguistic granularity of part-of-speech tagsets impacts the performance of tagging and syntactic dependency parsing. Our results show that parsing accuracy can be significantly improved by introducing more fine-grained morphological information in the tagset, even if tagger accuracy is compromised. Our taggers and parsers are t...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Building gold-standard treebanks for Norwegian

Språkbanken at the National Library of Norway is currently building up gold-standard Dependency Grammar treebanks for Norwegian Bokmål and Nynorsk. The treebanks are manually annotated for morphological features, syntactic functions and dependency relations. This paper explains the choice of texts and format of the treebanks, some key aspects of the morphological and syntactic annotation, and i...

Generating a Persian constituency treebank by automatic conversion

Treebanks are an important and useful resource in natural language processing tasks. Dependency treebanks and phrase-structure treebanks are the two best-known kinds. Many efforts have already been made to convert dependency structure to phrase structure. In this paper we study an approach to converting dependency structure to phrase structure, motivated by the lack of a large phrase-structure treebank for Persian. A...

Publication date: 2014